A Survey on Data Augmentation for Text Classification

Abstract

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization capabilities, it can also address many other challenges and problems, from overcoming a limited amount of training data, to regularizing the objective, to limiting the amount of data used to protect privacy. Based on a precise description of the goals and applications of data augmentation and a taxonomy for existing works, this survey is concerned with data augmentation methods for textual classification and aims at providing a concise and comprehensive overview for researchers and practitioners. Derived from the taxonomy, we divide more than 100 methods into 12 different groupings and give state-of-the-art references expounding which methods are highly promising by relating them to each other. Finally, research perspectives that may constitute a building block for future work are provided.

Similar resources

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Data Augmentation for Plant Classification

Data augmentation plays a crucial role in increasing the number of training images, which often helps to improve the classification performance of deep learning techniques for computer vision problems. In this paper, we employ the deep learning framework and determine the effects of several data-augmentation (DA) techniques for plant classification problems. For this, we use two convolutional neura...

A survey on phrase structure learning methods for text classification

Text classification is the task of automatically classifying text into one of a set of predefined categories. The problem of text classification has been widely studied in different communities such as natural language processing, data mining and information retrieval. Text classification is an important constituent in many information management tasks like topic identification, spam filtering, email r...

Real Data Augmentation for Medical Image Classification

Many medical image classification tasks share a common unbalanced data problem. That is, images of the target classes, e.g., certain types of diseases, only appear in a very small portion of the entire dataset. Nowadays, large collections of medical images are readily available. However, it is costly and may not even be feasible for medical experts to manually comb through a huge unlabeled data...

Journal

Journal title: ACM Computing Surveys

Year: 2022

ISSN: 0360-0300, 1557-7341

DOI: https://doi.org/10.1145/3544558